feat: support concat for strings
#2604
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Depends on #2586
Codecov Report ❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##              main    #2604      +/-  ##
============================================
+ Coverage    56.12%   59.22%    +3.10%
- Complexity     976     1448      +472
============================================
  Files          119      147       +28
  Lines        11743    13762     +2019
  Branches      2251     2365      +114
============================================
+ Hits          6591     8151     +1560
- Misses        4012     4387      +375
- Partials      1140     1224       +84
@andygrove please take another look.
    classOf[BitLength] -> CometScalarFunction("bit_length"),
    classOf[Chr] -> CometScalarFunction("char"),
    classOf[ConcatWs] -> CometScalarFunction("concat_ws"),
    classOf[Concat] -> CometScalarFunction("concat"),
I think we need type checks so that we fall back to Spark for unsupported argument types?
Perhaps something like this?
    object CometConcat extends CometScalarFunction[Concat]("concat") {
      override def getSupportLevel(expr: Concat): SupportLevel = {
        if (expr.children.forall(_.dataType == DataTypes.StringType)) {
          Compatible()
        } else {
          Incompatible(Some("Only string arguments are supported"))
        }
      }
    }

    createFunctionWithInputTypes(
      "concat",
      Seq(SparkStringType, SparkStringType)
    ), // TODO: variadic
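The type gate suggested above can be exercised outside Comet with stand-in types. This is a minimal sketch, assuming only that `getSupportLevel` reduces to a check over the children's data types; `SupportLevel`, `Compatible`, `Incompatible`, and the data-type objects below are stand-ins defined here, not Comet's or Spark's real classes:

```scala
// Stand-ins for Comet's support-level API, so the sketch is self-contained.
sealed trait SupportLevel
case class Compatible() extends SupportLevel
case class Incompatible(reason: Option[String]) extends SupportLevel

// Stand-ins for Spark's DataType hierarchy.
sealed trait DataType
case object StringType extends DataType
case object ArrayType extends DataType

// The gate: concat is Compatible only when every child is a string;
// anything else falls back to Spark with an explanatory reason.
def concatSupportLevel(childTypes: Seq[DataType]): SupportLevel =
  if (childTypes.forall(_ == StringType)) Compatible()
  else Incompatible(Some("Only string arguments are supported"))
```

With this shape, a mixed-type call such as `concat(col_str, col_arr)` would report the fallback reason instead of running natively.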
I know that this PR is just to support string inputs in Comet concat, but the fuzz tester should ideally test all types that Spark supports.
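One way to broaden the fuzz coverage is to register one signature per argument type Spark's `concat` accepts (strings, binary, and arrays) rather than only `SparkStringType`. A self-contained sketch, where the `SparkType` hierarchy is a stand-in modeled on the snippet above, not the fuzz tester's real types:

```scala
// Stand-in for the fuzz tester's type descriptors.
sealed trait SparkType
case object SparkStringType extends SparkType
case object SparkBinaryType extends SparkType
case class SparkArrayType(elem: SparkType) extends SparkType

// One two-argument signature per type family Spark's concat supports.
// Variadic arities would still need separate handling (the TODO above).
def concatSignatures: Seq[Seq[SparkType]] =
  Seq(SparkStringType, SparkBinaryType, SparkArrayType(SparkStringType))
    .map(t => Seq(t, t))
```

Exercising the binary and array signatures would then confirm the Comet plan falls back to Spark for those inputs instead of producing wrong results.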
    +- CometHashAggregate (67)
       +- CometExpand (66)
          +- CometUnion (65)
             :- CometHashAggregate (22)
Why is this hash aggregate now supported in Comet? I don't see concat used in the query.
https://github.com/apache/datafusion-benchmarks/blob/main/tpcds/queries-spark/q5.sql
    // https://github.com/apache/datafusion-comet/issues/2647
    ignore("test concat function - arrays") {
Could you enable these tests and use the recently added checkSparkAnswerAndFallbackReason method to make sure we are correctly falling back to Spark?
@andygrove PTAL
andygrove left a comment
LGTM. Thanks @comphead!
In case anyone is wondering why more operators are now running natively even though the TPC-DS queries do not contain concat.
concat exists in the query, no? `'store' || s_store_id as id`. But Spark just moves it into the aggregate, perhaps to avoid copying more data?
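For anyone skimming the q5 detail: in Spark SQL, `||` on strings is concat, and folding the concatenated key into the grouping step just means computing it once per row during aggregation instead of in a separate projection. A plain-Scala analogy of that folding (the data and names here are illustrative, not from the PR):

```scala
// 'store' || s_store_id builds a grouping key per row; it can be
// computed inside the grouping step rather than as a prior projection.
val storeIds = Seq("s1", "s2", "s1")
val countsByKey: Map[String, Int] =
  storeIds
    .groupBy(id => "store" + id) // the concat, folded into grouping
    .map { case (key, rows) => key -> rows.size }
```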
:) tadaam @rluvaton that's true, I totally forgot about that.
Which issue does this PR close?
Closes #2552.
Rationale for this change
What changes are included in this PR?
How are these changes tested?